Overview

Dataset statistics

Number of variables: 11
Number of observations: 699
Missing cells: 16
Missing cells (%): 0.2%
Duplicate rows: 8
Duplicate rows (%): 1.1%
Total size in memory: 60.2 KiB
Average record size in memory: 88.2 B

Variable types

Numeric (NUM): 10
Categorical (CAT): 1

Warnings

Dataset has 8 (1.1%) duplicate rows [Duplicates]
shape_unif is highly correlated with size_unif [High correlation]
size_unif is highly correlated with shape_unif [High correlation]
bare_nuclei has 16 (2.3%) missing values [Missing]

Reproduction

Analysis started: 2020-12-15 03:35:48.770714
Analysis finished: 2020-12-15 03:36:01.877910
Duration: 13.11 seconds
Software version: pandas-profiling v2.9.0
Download configuration: config.yaml

Variables

sample
Real number (ℝ≥0)

Distinct: 645
Distinct (%): 92.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1071704.099
Minimum: 61634
Maximum: 13454352
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 61634
5th percentile: 411453
Q1: 870688.5
Median: 1171710
Q3: 1238298
95th percentile: 1333890.8
Maximum: 13454352
Range: 13392718
Interquartile range (IQR): 367609.5

Descriptive statistics

Standard deviation: 617095.7298
Coefficient of variation (CV): 0.5758079404
Kurtosis: 257.7171591
Mean: 1071704.099
Median Absolute Deviation (MAD): 104381
Skewness: 13.67532594
Sum: 749121165
Variance: 3.808071398e+11
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Common values
Value               Count  Frequency (%)
1182404             6      0.9%
1276091             5      0.7%
1198641             3      0.4%
466906              2      0.3%
1116116             2      0.3%
1070935             2      0.3%
385103              2      0.3%
1293439             2      0.3%
1240603             2      0.3%
1277792             2      0.3%
Other values (635)  671    96.0%

Minimum 5 values
Value               Count  Frequency (%)
61634               1      0.1%
63375               1      0.1%
76389               1      0.1%
95719               1      0.1%
128059              1      0.1%

Maximum 5 values
Value               Count  Frequency (%)
13454352            1      0.1%
8233704             1      0.1%
1371920             1      0.1%
1371026             1      0.1%
1369821             1      0.1%

thickness
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4.417739628
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 4
Q3: 6
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.815740659
Coefficient of variation (CV): 0.6373713473
Kurtosis: -0.6237154123
Mean: 4.417739628
Median Absolute Deviation (MAD): 2
Skewness: 0.5928585327
Sum: 3088
Variance: 7.928395456
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value      Count  Frequency (%)
1          145    20.7%
5          130    18.6%
3          108    15.5%
4          80     11.4%
10         69     9.9%
2          50     7.2%
8          46     6.6%
6          34     4.9%
7          23     3.3%
9          14     2.0%

Minimum 5 values
Value      Count  Frequency (%)
1          145    20.7%
2          50     7.2%
3          108    15.5%
4          80     11.4%
5          130    18.6%

Maximum 5 values
Value      Count  Frequency (%)
10         69     9.9%
9          14     2.0%
8          46     6.6%
7          23     3.3%
6          34     4.9%

size_unif
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.134477825
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 5
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 3.05145911
Coefficient of variation (CV): 0.9735143395
Kurtosis: 0.09880288537
Mean: 3.134477825
Median Absolute Deviation (MAD): 0
Skewness: 1.233136558
Sum: 2191
Variance: 9.3114027
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value      Count  Frequency (%)
1          384    54.9%
10         67     9.6%
3          52     7.4%
2          45     6.4%
4          40     5.7%
5          30     4.3%
8          29     4.1%
6          27     3.9%
7          19     2.7%
9          6      0.9%

Minimum 5 values
Value      Count  Frequency (%)
1          384    54.9%
2          45     6.4%
3          52     7.4%
4          40     5.7%
5          30     4.3%

Maximum 5 values
Value      Count  Frequency (%)
10         67     9.6%
9          6      0.9%
8          29     4.1%
7          19     2.7%
6          27     3.9%

shape_unif
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.207439199
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 5
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 4

Descriptive statistics

Standard deviation: 2.971912767
Coefficient of variation (CV): 0.9265686995
Kurtosis: 0.007010980047
Mean: 3.207439199
Median Absolute Deviation (MAD): 0
Skewness: 1.161859179
Sum: 2242
Variance: 8.832265496
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value      Count  Frequency (%)
1          353    50.5%
2          59     8.4%
10         58     8.3%
3          56     8.0%
4          44     6.3%
5          34     4.9%
7          30     4.3%
6          30     4.3%
8          28     4.0%
9          7      1.0%

Minimum 5 values
Value      Count  Frequency (%)
1          353    50.5%
2          59     8.4%
3          56     8.0%
4          44     6.3%
5          34     4.9%

Maximum 5 values
Value      Count  Frequency (%)
10         58     8.3%
9          7      1.0%
8          28     4.0%
7          30     4.3%
6          30     4.3%

marginal_adhesion
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.806866953
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 4
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.855379239
Coefficient of variation (CV): 1.017283429
Kurtosis: 0.9879470695
Mean: 2.806866953
Median Absolute Deviation (MAD): 0
Skewness: 1.524468091
Sum: 1962
Variance: 8.1531906
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value      Count  Frequency (%)
1          407    58.2%
3          58     8.3%
2          58     8.3%
10         55     7.9%
4          33     4.7%
8          25     3.6%
5          23     3.3%
6          22     3.1%
7          13     1.9%
9          5      0.7%

Minimum 5 values
Value      Count  Frequency (%)
1          407    58.2%
2          58     8.3%
3          58     8.3%
4          33     4.7%
5          23     3.3%

Maximum 5 values
Value      Count  Frequency (%)
10         55     7.9%
9          5      0.7%
8          25     3.6%
7          13     1.9%
6          22     3.1%

epithelial_cell_size
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.21602289
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 2
Q3: 4
95th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 2

Descriptive statistics

Standard deviation: 2.214299887
Coefficient of variation (CV): 0.6885211836
Kurtosis: 2.169066423
Mean: 3.21602289
Median Absolute Deviation (MAD): 0
Skewness: 1.712171802
Sum: 2248
Variance: 4.903123988
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value      Count  Frequency (%)
2          386    55.2%
3          72     10.3%
4          48     6.9%
1          47     6.7%
6          41     5.9%
5          39     5.6%
10         31     4.4%
8          21     3.0%
7          12     1.7%
9          2      0.3%

Minimum 5 values
Value      Count  Frequency (%)
1          47     6.7%
2          386    55.2%
3          72     10.3%
4          48     6.9%
5          39     5.6%

Maximum 5 values
Value      Count  Frequency (%)
10         31     4.4%
9          2      0.3%
8          21     3.0%
7          12     1.7%
6          41     5.9%

bare_nuclei
Real number (ℝ≥0)

MISSING

Distinct: 10
Distinct (%): 1.5%
Missing: 16
Missing (%): 2.3%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.54465593
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 6
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 5

Descriptive statistics

Standard deviation: 3.64385716
Coefficient of variation (CV): 1.027986138
Kurtosis: -0.7988441354
Mean: 3.54465593
Median Absolute Deviation (MAD): 0
Skewness: 0.9900156547
Sum: 2421
Variance: 13.27769501
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value      Count  Frequency (%)
1          402    57.5%
10         132    18.9%
5          30     4.3%
2          30     4.3%
3          28     4.0%
8          21     3.0%
4          19     2.7%
9          9      1.3%
7          8      1.1%
6          4      0.6%
(Missing)  16     2.3%

Minimum 5 values
Value      Count  Frequency (%)
1          402    57.5%
2          30     4.3%
3          28     4.0%
4          19     2.7%
5          30     4.3%

Maximum 5 values
Value      Count  Frequency (%)
10         132    18.9%
9          9      1.3%
8          21     3.0%
7          8      1.1%
6          4      0.6%

bland_chromatin
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.43776824
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 2
Median: 3
Q3: 5
95th percentile: 8
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 2.438364252
Coefficient of variation (CV): 0.7092869798
Kurtosis: 0.1846213115
Mean: 3.43776824
Median Absolute Deviation (MAD): 1
Skewness: 1.099969082
Sum: 2403
Variance: 5.945620227
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value      Count  Frequency (%)
2          166    23.7%
3          165    23.6%
1          152    21.7%
7          73     10.4%
4          40     5.7%
5          34     4.9%
8          28     4.0%
10         20     2.9%
9          11     1.6%
6          10     1.4%

Minimum 5 values
Value      Count  Frequency (%)
1          152    21.7%
2          166    23.7%
3          165    23.6%
4          40     5.7%
5          34     4.9%

Maximum 5 values
Value      Count  Frequency (%)
10         20     2.9%
9          11     1.6%
8          28     4.0%
7          73     10.4%
6          10     1.4%

normal_nucleoli
Real number (ℝ≥0)

Distinct: 10
Distinct (%): 1.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.86695279
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 4
95th percentile: 10
Maximum: 10
Range: 9
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 3.053633894
Coefficient of variation (CV): 1.065114816
Kurtosis: 0.4742686755
Mean: 2.86695279
Median Absolute Deviation (MAD): 0
Skewness: 1.422261257
Sum: 2004
Variance: 9.324679956
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=10)
Common values
Value      Count  Frequency (%)
1          443    63.4%
10         61     8.7%
3          44     6.3%
2          36     5.2%
8          24     3.4%
6          22     3.1%
5          19     2.7%
4          18     2.6%
9          16     2.3%
7          16     2.3%

Minimum 5 values
Value      Count  Frequency (%)
1          443    63.4%
2          36     5.2%
3          44     6.3%
4          18     2.6%
5          19     2.7%

Maximum 5 values
Value      Count  Frequency (%)
10         61     8.7%
9          16     2.3%
8          24     3.4%
7          16     2.3%
6          22     3.1%

mitoses
Real number (ℝ≥0)

Distinct: 9
Distinct (%): 1.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1.589413448
Minimum: 1
Maximum: 10
Zeros: 0
Zeros (%): 0.0%
Memory size: 5.5 KiB

Quantile statistics

Minimum: 1
5th percentile: 1
Q1: 1
Median: 1
Q3: 1
95th percentile: 5
Maximum: 10
Range: 9
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 1.715077943
Coefficient of variation (CV): 1.07906344
Kurtosis: 12.65787807
Mean: 1.589413448
Median Absolute Deviation (MAD): 0
Skewness: 3.560657844
Sum: 1111
Variance: 2.941492349
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=9)
Common values
Value      Count  Frequency (%)
1          579    82.8%
2          35     5.0%
3          33     4.7%
10         14     2.0%
4          12     1.7%
7          9      1.3%
8          8      1.1%
5          6      0.9%
6          3      0.4%

Minimum 5 values
Value      Count  Frequency (%)
1          579    82.8%
2          35     5.0%
3          33     4.7%
4          12     1.7%
5          6      0.9%

Maximum 5 values
Value      Count  Frequency (%)
10         14     2.0%
8          8      1.1%
7          9      1.3%
6          3      0.4%
5          6      0.9%
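Because a value-count table lists every distinct value of mitoses, most of its quantile and descriptive statistics can be cross-checked from the counts alone. The following is a minimal standard-library sketch (Python 3.8 or later, for `statistics.fmean` and the `method` argument of `statistics.quantiles`) that rebuilds the column from the counts above and recomputes several figures. It assumes the sample variance (n - 1 denominator), linear interpolation between order statistics for quantiles, and MAD as the median of absolute deviations from the median; these assumptions reproduce the reported mitoses values, but they are not taken from pandas-profiling's source. `expand` and `describe` are hypothetical helper names, and skewness and kurtosis are left out.

```python
import math
import statistics

# Value -> count for `mitoses`, copied from the value-count table in this report.
MITOSES_COUNTS = {1: 579, 2: 35, 3: 33, 10: 14, 4: 12, 7: 9, 8: 8, 5: 6, 6: 3}


def expand(counts):
    """Rebuild the full, sorted column from a value -> count mapping."""
    return sorted(value for value, count in counts.items() for _ in range(count))


def describe(xs):
    """Recompute a subset of the report's quantile and descriptive statistics."""
    mean = statistics.fmean(xs)
    variance = statistics.variance(xs)  # sample variance, n - 1 denominator
    std = math.sqrt(variance)
    # method="inclusive" interpolates linearly between order statistics,
    # like the default percentile interpolation in NumPy and pandas.
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    ventiles = statistics.quantiles(xs, n=20, method="inclusive")
    # Median absolute deviation from the median.
    mad = statistics.median(abs(x - median) for x in xs)
    return {
        "sum": sum(xs),
        "mean": mean,
        "variance": variance,
        "std": std,
        "cv": std / mean,
        "p5": ventiles[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": ventiles[-1],
        "iqr": q3 - q1,
        "mad": mad,
    }


stats = describe(expand(MITOSES_COUNTS))
for name, value in stats.items():
    print(f"{name}: {value}")
```

Running it reproduces the mitoses figures above: Sum 1111, Mean about 1.589, Variance about 2.941, 95th percentile 5, and an IQR and MAD of 0, because 579 of the 699 values (82.8%) equal 1.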
 

class
Categorical

Distinct: 2
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Memory size: 5.5 KiB

Value      Count  Frequency (%)
2          458    65.5%
4          241    34.5%
Frequencies of value counts

Unique

Unique: 0
Unique (%): 0.0%
Histogram of lengths of the category

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Interactions


Correlations


Pearson's r

The Pearson correlation coefficient (r) measures the linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and 1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and positive changes in scale of the two variables (a negative scale factor flips its sign), so for an exact linear relationship the steepness of the line does not affect r; only the direction of its slope does.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
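The definition translates directly into code. This is a minimal plain-Python sketch (no NumPy or pandas; `pearson_r` is a hypothetical helper name, not pandas-profiling's own implementation) that computes r and illustrates the invariance under shifts and positive rescaling described above.

```python
import math


def pearson_r(xs, ys):
    """Covariance of X and Y divided by the product of their standard deviations."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/(n - 1) factors of the covariance and of both standard
    # deviations cancel, so plain sums of centred terms suffice.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(pearson_r(x, y))  # approximately 0.8
# Shifting and positively rescaling each variable leaves r unchanged...
print(pearson_r([10 * v + 3 for v in x], [0.5 * v - 7 for v in y]))  # approximately 0.8
# ...while a negative scale factor flips its sign.
print(pearson_r(x, [-2 * v for v in y]))  # approximately -0.8
```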

Spearman's ρ

The Spearman rank correlation coefficient (ρ) measures the monotonic correlation between two variables, and is therefore better than Pearson's r at capturing non-linear monotonic relationships. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and 1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of the rank variables' standard deviations; tied values receive the average of the ranks they span.
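A minimal plain-Python sketch of the rank-based definition, with hypothetical helper names. Ties are pervasive in this dataset because most variables take integer values from 1 to 10, so the sketch gives tied values the average of the ranks they span (the convention pandas and SciPy use for Spearman's ρ) and then takes Pearson's r of the ranks. The example contrasts the two coefficients on a relationship that is monotonic but not linear.

```python
import math


def pearson_r(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` over the run of values tied with values[order[start]].
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of 1-based ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r of the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # monotonic but far from linear
print(pearson_r(x, y))     # below 1: the relationship is not linear
print(spearman_rho(x, y))  # approximately 1.0: the relationship is perfectly monotonic
print(average_ranks([10, 20, 20, 30]))  # [1.0, 2.5, 2.5, 4.0]
```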

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures the ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and 1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant pairs of observations (both variables order the pair the same way) and the discordant pairs (the variables order it oppositely). τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2.
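The counting definition can be sketched as follows (plain Python; `kendall_tau_a` is a hypothetical helper name). This is the tau-a variant, which divides by the total number of pairs: a pair tied in either variable counts as neither concordant nor discordant, so with ties, which are common here, tau-a cannot reach ±1 even when the variables agree on every untied pair. To our understanding, pandas' `DataFrame.corr(method="kendall")` delegates to `scipy.stats.kendalltau`, which by default computes the tie-adjusted tau-b, so its results on this data may differ from the sketch.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length, at least 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # Positive product: both variables order the pair the same way.
        s = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
        # s == 0: the pair is tied in x or y and counts as neither.
    return (concordant - discordant) / (n * (n - 1) // 2)


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant: 4/6
print(kendall_tau_a([1, 1, 2], [1, 2, 3]))        # tie in x: 2/3, not 1
```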

Phik (φk)

Phik (φk) is a relatively new, practical correlation coefficient that works consistently across categorical, ordinal, and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient for a bivariate normal input distribution. The phik package provides extensive documentation.

Missing values

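All 16 missing cells fall in bare_nuclei. The sketch below counts missing entries per column with the standard library only. It assumes a headerless, comma-separated source file in which a missing value is written as "?" (the convention, to our knowledge, of the UCI copy of this dataset); the column names follow this report, the first two data rows are rows 0 and 2 of the "First rows" sample below, and the third row, including its sample ID, is hypothetical. `parse` and `missing_counts` are hypothetical helper names. With pandas, the equivalent would be reading the file with `pd.read_csv(..., na_values="?")` and calling `df.isna().sum()`; pandas stores an integer column that contains missing values as float64, which is consistent with bare_nuclei appearing as 1.0 and 10.0 in the sample rows while the other columns are integers.

```python
import csv
import io

COLUMNS = ["sample", "thickness", "size_unif", "shape_unif", "marginal_adhesion",
           "epithelial_cell_size", "bare_nuclei", "bland_chromatin",
           "normal_nucleoli", "mitoses", "class"]

# Two rows from this report plus a hypothetical third row whose
# bare_nuclei value is missing and written as "?".
RAW = """1000025,5,1,1,1,2,1,3,1,1,2
1015425,3,1,1,1,2,2,3,1,1,2
1234567,4,1,1,1,2,?,3,1,1,2
"""


def parse(text, missing_marker="?"):
    """Parse headerless CSV text into dicts, mapping the missing marker to None."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:
            continue  # skip blank lines
        values = [None if field == missing_marker else int(field) for field in record]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


def missing_counts(rows):
    """Number of missing (None) entries per column."""
    return {col: sum(row[col] is None for row in rows) for col in COLUMNS}


rows = parse(RAW)
print(missing_counts(rows)["bare_nuclei"])  # 1
```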

Sample

First rows

    sample  thickness  size_unif  shape_unif  marginal_adhesion  epithelial_cell_size  bare_nuclei  bland_chromatin  normal_nucleoli  mitoses  class
0  1000025          5          1           1                  1                     2          1.0                3                1        1      2
1  1002945          5          4           4                  5                     7         10.0                3                2        1      2
2  1015425          3          1           1                  1                     2          2.0                3                1        1      2
3  1016277          6          8           8                  1                     3          4.0                3                7        1      2
4  1017023          4          1           1                  3                     2          1.0                3                1        1      2
5  1017122          8         10          10                  8                     7         10.0                9                7        1      4
6  1018099          1          1           1                  1                     2         10.0                3                1        1      2
7  1018561          2          1           2                  1                     2          1.0                3                1        1      2
8  1033078          2          1           1                  1                     2          1.0                1                1        5      2
9  1033078          4          2           1                  1                     2          1.0                2                1        1      2

Last rows

      sample  thickness  size_unif  shape_unif  marginal_adhesion  epithelial_cell_size  bare_nuclei  bland_chromatin  normal_nucleoli  mitoses  class
689   654546          1          1           1                  1                     2          1.0                1                1        8      2
690   654546          1          1           1                  3                     2          1.0                1                1        1      2
691   695091          5         10          10                  5                     4          5.0                4                4        1      4
692   714039          3          1           1                  1                     2          1.0                1                1        1      2
693   763235          3          1           1                  1                     2          1.0                2                1        2      2
694   776715          3          1           1                  1                     3          2.0                1                1        1      2
695   841769          2          1           1                  1                     2          1.0                1                1        1      2
696   888820          5         10          10                  3                     7          3.0                8               10        2      4
697   897471          4          8           6                  4                     3          4.0               10                6        1      4
698   897471          4          8           8                  5                     4          5.0               10                4        1      4

Duplicate rows

Most frequent

    sample  thickness  size_unif  shape_unif  marginal_adhesion  epithelial_cell_size  bare_nuclei  bland_chromatin  normal_nucleoli  mitoses  class  count
0   320675          3          3           5                  2                     3         10.0                7                1        1      4      2
1   466906          1          1           1                  1                     2          1.0                1                1        1      2      2
2   704097          1          1           1                  1                     1          1.0                2                1        1      2      2
3  1100524          6         10          10                  2                     8         10.0                7                3        3      4      2
4  1116116          9         10          10                  1                    10          8.0                3                3        1      4      2
5  1198641          3          1           1                  1                     2          1.0                3                1        1      2      2
6  1218860          1          1           1                  1                     1          1.0                3                1        1      2      2
7  1321942          5          1           1                  1                     2          1.0                3                1        1      2      2
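A duplicate row here matches another row in every column, including sample. Because sample IDs themselves repeat (645 distinct values among 699 rows), sharing an ID is not enough: rows 8 and 9 of "First rows" both have sample 1033078 but differ elsewhere. The following minimal standard-library sketch (`duplicate_summary` is a hypothetical helper name) counts each extra copy of a fully repeated row as one duplicate, which is what pandas' `df.duplicated().sum()` does with its default `keep="first"`. The report does not say which convention it uses; assuming the "Most frequent" table lists every duplicated row, its eight rows occurring twice each give 8 extra copies, consistent with the warning.

```python
from collections import Counter


def duplicate_summary(rows):
    """Count rows that repeat an earlier row across every column.

    Returns (number of extra copies, {row: occurrences} for rows seen more than once).
    """
    counts = Counter(tuple(row) for row in rows)
    repeated = {row: n for row, n in counts.items() if n > 1}
    extra_copies = sum(n - 1 for n in repeated.values())
    return extra_copies, repeated


rows = [
    # Same sample ID, different measurements: not duplicates (rows 8 and 9 of "First rows").
    (1033078, 2, 1, 1, 1, 2, 1.0, 1, 1, 5, 2),
    (1033078, 4, 2, 1, 1, 2, 1.0, 2, 1, 1, 2),
    # Identical in every column: one duplicate (listed with count 2 under "Most frequent").
    (466906, 1, 1, 1, 1, 2, 1.0, 1, 1, 1, 2),
    (466906, 1, 1, 1, 1, 2, 1.0, 1, 1, 1, 2),
]
extra, repeated = duplicate_summary(rows)
print(extra)     # 1
print(repeated)  # {(466906, 1, 1, 1, 1, 2, 1.0, 1, 1, 1, 2): 2}
```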